A study on the Financial Health of Engagement, Ohio, USA.
Engagement, Ohio, USA is a small town with huge potential and experiencing sudden growth. We aim to analyse the data and derive insights which will help to plan the budget utilization wisely and to develop the infrastructure of the town to keep up with the growth.
For our Visual Analytics Project we aim to tackle the task 3 of the Vast Challenge.
Challenge 3:
Economic considers the financial health of the city. Over time, are businesses growing or shrinking? How are people changing jobs? Are standards of living improving or declining over time?
Consider the financial status of Engagement’s businesses and residents, and use visual analytic techniques to address these questions.
Problem 1: * Over the period covered by the dataset, which businesses appear to be more prosperous? Which appear to be struggling? Describe your rationale for your answers. Limit your response to 10 images and 500 words.
How does the financial health of the residents change over the period covered by the dataset? How do wages compare to the overall cost of living in Engagement? Are there groups that appear to exhibit similar patterns? Describe your rationale for your answers. Limit your response to 10 images and 500 words.
Describe the health of the various employers within the city limits. What employment patterns do you observe? Do you notice any areas of particularly high or low turnover? Limit your response to 10 images and 500 words.
packages = c('tidyverse','ggdist','gghalves','ggthemes','hrbrthemes','ggridges','patchwork','zoo', 'ggrepel','ggiraph','lubridate','gganimate','scales', 'plotly','treemap')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
Problem 1: (Shachi) Over the period covered by the dataset, which businesses appear to be more prosperous? Which appear to be struggling? Describe your rationale for your answers. Limit your response to 10 images and 500 words.
Links to the dataset:
CheckinJournal.csv
TravelJournal.csv
The code chunk below imports Restaurants.csv and
TravelJournal.csv and Pubs.csv from the data folder
into R by using read_csv()
function of readr and saves it as Tibble data frame
called travel and restaurants and pubs
restaurants <- read_csv("data/Restaurants.csv")
summary(restaurants)
restaurantId foodCost maxOccupancy location
Min. : 445.0 Min. :4.070 Min. : 48.00 Length:20
1st Qu.: 783.5 1st Qu.:4.395 1st Qu.: 71.75 Class :character
Median :1122.0 Median :5.095 Median : 85.00 Mode :character
Mean :1123.5 Mean :5.035 Mean : 84.70
3rd Qu.:1462.0 3rd Qu.:5.678 3rd Qu.:103.25
Max. :1805.0 Max. :5.920 Max. :119.00
buildingId
Min. : 27.0
1st Qu.:151.0
Median :294.5
Mean :432.9
3rd Qu.:754.2
Max. :991.0
pubs <- read_csv("data/Pubs.csv")
summary(pubs)
pubId hourlyCost maxOccupancy location
Min. : 442 Min. : 6.417 Min. :60.00 Length:12
1st Qu.: 780 1st Qu.: 9.725 1st Qu.:64.00 Class :character
Median :1118 Median :11.035 Median :69.50 Mode :character
Mean :1120 Mean :10.866 Mean :71.83
3rd Qu.:1458 3rd Qu.:12.379 3rd Qu.:77.50
Max. :1800 Max. :14.840 Max. :96.00
buildingId
Min. : 29.0
1st Qu.: 237.0
Median : 495.5
Mean : 484.8
3rd Qu.: 595.5
Max. :1012.0
travel <- read_csv("data/TravelJournal.csv")
summary(restaurants)
The Travel Journal contains financial transactions by a participant towards Work/Home Commute, Eating, Coming Back From Restaurant,Recreation (Social Gathering), Going Back to Home. We filter out the records related to Eating and Recreation (Social Gathering).
Calculating the total amount spent at the location as a difference of the starting balance and ending balance in the travel journal
travel_filt$amountSpent <- travel_filt$startingBalance -travel_filt$endingBalance
saveRDS(travel_filt, 'data/travel_filt.rds')
# A tibble: 6 × 11
participantId travelStartTime travelStartLocationId
<dbl> <dttm> <dbl>
1 23 2022-03-01 05:20:00 532
2 876 2022-03-01 05:50:00 NA
3 902 2022-03-01 06:05:00 NA
4 919 2022-03-01 06:00:00 NA
5 154 2022-03-01 05:55:00 NA
6 509 2022-03-01 06:00:00 NA
# … with 8 more variables: travelEndTime <dttm>,
# travelEndLocationId <dbl>, purpose <chr>, checkInTime <dttm>,
# checkOutTime <dttm>, startingBalance <dbl>, endingBalance <dbl>,
# amountSpent <dbl>
Grouping the data by the travelEndLocationId which is equal to the restaurant ID or the pub ID.
travel_group = travel_filt %>%group_by(travelEndLocationId) %>%
summarise(amountSpent = sum(amountSpent),
.groups = 'drop')%>%
arrange(desc(amountSpent))
library(dplyr)
data_merge <-merge(x=travel_group, y=restaurants, by.x = 'travelEndLocationId', by.y = 'restaurantId')
data_merge$travelEndLocationId <- as.character(data_merge$travelEndLocationId)
data_merge$amountSpent <- data_merge$amountSpent/1000
The restaurants highlighted in red are among those which had a revenue less than 50 thousand dollars over the period of time
library(plotly)
color = c('rgba(222,45,38,0.8)','rgba(204,204,204,1)','rgba(204,204,204,1)', 'rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)', 'rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(222,45,38,0.8)','rgba(222,45,38,0.8)','rgba(222,45,38,0.8)',
'rgba(222,45,38,0.8)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)')
fig<- plot_ly(data_merge, x = ~reorder(`travelEndLocationId`, -amountSpent), y = ~amountSpent, type = 'bar', marker = list(color = color))
fig <- fig %>% layout(title = "Revenue for Restraunts",
xaxis = list(title = "Retaurant ID"),
yaxis = list(title = "Revenue (in thousands $)"))
fig
The restaurants highlighted in red are among those which had a revenue less than 300 thousand dollars over the period of time
data_pub <-merge(x=travel_group, y=pubs, by.x = "travelEndLocationId", by.y = "pubId")
data_pub$travelEndLocationId <- as.character(data_pub$travelEndLocationId)
data_pub$amountSpent <- data_pub$amountSpent/1000
library(plotly)
color = c('rgba(222,45,38,0.8)','rgba(222,45,38,0.8)','rgba(222,45,38,0.8)', 'rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)','rgba(204,204,204,1)', 'rgba(204,204,204,1)', 'rgba(204,204,204,1)', 'rgba(204,204,204,1)')
fig<- plot_ly(data_pub, x = ~reorder(`travelEndLocationId`, -amountSpent), y = ~amountSpent, type = 'bar', marker = list(color = color))
fig <- fig %>% layout(title = "Revenue for Pubs",
xaxis = list(title = "Pub ID"),
yaxis = list(title = "Revenue (in thousands $)"))
fig
treemap(data_merge,
index=c("travelEndLocationId"),
vSize="amountSpent",
vColor="amountSpent",
title="Amount Spent in thousands of Dollars - Restaurants",
title.legend = "Amount Spent in thousands of Dollars - Restaurants"
)
data_travel= travel_filt %>%
mutate(weekday = weekdays(checkInTime),
day = day(checkInTime),
month=as.character(checkInTime,"%b %y"),
year = year(checkInTime),
monthYear = floor_date(checkInTime, "month"),
travelEndLocationId=as.character(travelEndLocationId),
timeSpent = checkOutTime - checkInTime,
participantId=as.character(participantId),
purpose=as.character(purpose))
data_travel$timeSpent <- as.numeric(as.character(data_travel$timeSpent))
data_travel <- data_travel[,c("participantId","travelStartLocationId", "travelEndLocationId", "purpose", "amountSpent","timeSpent","weekday","day","month","year","monthYear")]
group_pub <-merge(x=data_travel, y=pubs, by.x = "travelEndLocationId", by.y = "pubId")
group_restaurant<-merge(x=data_travel, y=restaurants, by.x = 'travelEndLocationId', by.y = 'restaurantId')
p<- ggplot(group_pub, aes(x=month, y=amountSpent, group=travelEndLocationId)) +
geom_line(aes(color=travelEndLocationId),show.legend = TRUE)+
labs(
y= 'Revenue (Thousands$)',
x= 'months -2022',
title = "Revenue Pubs - 2022",
caption = "Ohio USA"
) +
theme_minimal()+
theme(axis.ticks.x= element_blank(),
panel.background= element_blank(),
legend.background = element_blank(),
plot.title = element_text(size=12, face="bold",hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0),
axis.title.y= element_text(angle=0))
ggplotly(p)
p<- ggplot(group_restaurant, aes(x=month, y=amountSpent, group=travelEndLocationId)) +
geom_line(aes(color=travelEndLocationId),show.legend = TRUE)+
labs(
y= 'Revenue (Thousands$)',
x= 'months -2022',
title = "Revenue Restaurants- 2022",
caption = "Ohio USA"
) +
theme_minimal()+
theme(axis.ticks.x= element_blank(),
panel.background= element_blank(),
legend.background = element_blank(),
plot.title = element_text(size=12, face="bold",hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0),
axis.title.y= element_text(angle=0))
ggplotly(p)
`
Problem 2: (Rakendu) How does the financial health of the residents change over the period covered by the dataset? How do wages compare to the overall cost of living in Engagement? Are there groups that appear to exhibit similar patterns? Describe your rationale for your answers. Limit your response to 10 images and 500 words.
The Financial Journal of the participants was used to derive insights about the financial health of the residents. Let us take a look at the data :
We use the dplyr package to group by participant id and date from timestamp to find the income and expenditure of the participants. The code can be seen here
We will read in the wrangled data saved as rds file in order reduce the size
participant_fin <- read_rds("data/rds/participant_fin.rds")
We can use a scatterplot to understand the variations in income vs expenses of participants over time.
participant_fin %>%
filter(date >= 'Apr 2022') %>%
transform(date = as.Date(date, frac = 1)) %>%
ggplot(aes(x=income, y = abs(expense), size = savings, color = educationLevel))+
geom_point(alpha=0.7) +
ggtitle("Income vs Expense by different Education Levels") +
ylab("Expense") +
xlab("Income")+
theme_minimal() +
theme(axis.line = element_line(size = 0.5),
axis.text = element_text(size = 16),
axis.title = element_text(size=16),
axis.title.y = element_text(angle = 0),
legend.title = element_text(size =16),
legend.text = element_text(size = 16),
plot.title = element_text(size =20,hjust = 0.5))+
labs(title ='Period : {frame_time}')+
transition_time(date)+
ease_aes('linear')